NSF PAR Search | NSF Public Access Repository

HOI-Swap: Swapping Objects in Videos with Hand-Object Interaction Awareness

Xue, Zihui; Luo, Mi; Chen, Changan; Grauman, Kristen (November 2024, https://doi.org/10.48550/arXiv.2406.07754)

This paper addresses the challenge of precisely swapping objects in videos, particularly those involved in hand-object interactions (HOI), using a single user-provided reference object image. While diffusion models have advanced video editing, they struggle with the complexities of HOI, often failing to generate realistic edits when object swaps involve changes in shape or functionality. To overcome this, the authors propose HOI-Swap, a novel diffusion-based video editing framework trained in a self-supervised manner. The framework operates in two stages: (1) single-frame object swapping with HOI awareness, where the model learns to adjust interaction patterns (e.g., hand grasp) based on object property changes; and (2) sequence-wide extension, where motion alignment is achieved by warping a sequence from the edited frame using sampled motion points and conditioning generation on the warped sequence. Extensive qualitative and quantitative evaluations demonstrate that HOI-Swap significantly outperforms prior methods, producing high-quality, realistic HOI video edits.
Full Text Available
ActiveRIR: Active Audio-Visual Exploration for Acoustic Environment Modeling

Somayazulu, Arjun; Majumder, Sagnik; Chen, Changan; Grauman, Kristen (August 2024, International Conference on Intelligent Robots and Systems (IROS))

Full Text Available
Sim2Real Transfer for Audio-Visual Navigation with Frequency-Adaptive Acoustic Field Prediction

Chen, Changan; Ramos, Jordi; Tomar, Anshul; Grauman, Kristen (August 2024, International Conference on Intelligent Robots and Systems (IROS))

Full Text Available
Action2Sound: Ambient-Aware Generation of Action Sounds from Egocentric Videos

Chen, Changan; Peng, Puyuan; Baid, Ami; Xue, Zihui; Hsu, Wei-Ning; Harwath, David; Grauman, Kristen (July 2024, https://doi.org/10.48550/arXiv.2406.09272)

Generating realistic audio for human actions is critical for applications such as film sound effects and virtual reality games. Existing methods assume complete correspondence between video and audio during training, but in real-world settings, many sounds occur off-screen or weakly correspond to visuals, leading to uncontrolled ambient sounds or hallucinations at test time. This paper introduces AV-LDM, a novel ambient-aware audio generation model that disentangles foreground action sounds from ambient background noise in in-the-wild training videos. The approach leverages a retrieval-augmented generation framework to synthesize audio that aligns both semantically and temporally with the visual input. Trained and evaluated on Ego4D and EPIC-KITCHENS datasets, along with the newly introduced Ego4D-Sounds dataset (1.2M curated clips with action-audio correspondence), the model outperforms prior methods, enables controllable ambient sound generation, and shows promise for generalization to synthetic video game clips. This work is the first to emphasize faithful video-to-audio generation focused on observed visual content despite noisy, uncurated training data.
Full Text Available
Learning Audio-Visual Dereverberation

https://doi.org/10.1109/ICASSP49357.2023.10095818

Chen, Changan; Sun, Wei; Harwath, David; Grauman, Kristen (June 2023, ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP))

Full Text Available
Learning Audio-Visual Dereverberation

Chen, Changan; Sun, Wei; Harwath, David; Grauman, Kristen (January 2023, ICASSP)

Full Text Available
Few-Shot Audio-Visual Learning of Environment Acoustics

Majumder, Sagnik; Chen, Changan; Al-Halah, Ziad; Grauman, Kristen (January 2022, Advances in Neural Information Processing Systems)

Full Text Available
Semantic Audio-Visual Navigation

Chen, Changan; Al-Halah, Ziad; Grauman, Kristen (January 2021, IEEE Conference on Computer Vision and Pattern Recognition)

Full Text Available
